A METHOD FOR INTRODUCING CONJUGATED CAPS INTO MOLECULE
FRAGMENTS AND SYSTEMS AND METHODS FOR USING THE SAME TO 
DETERMINE INTER-MOLECULAR INTERACTION ENERGIES 



[0001] This application claims benefit of U.S. Provisional Patent Application Serial No. 60/463,753, filed April 17, 2003, the disclosure of which is hereby incorporated by reference in its entirety.



FIELD OF THE INVENTION 

[0002] This invention relates to a method of introducing conjugated caps onto molecule fragments. After the molecular fragments have been capped, the intermolecular interaction energy between the decomposed molecule and a second molecule can be calculated using the resulting molecular portions.


BACKGROUND OF THE INVENTION 



[0003] A grand challenge in computational chemistry and biology is the accurate quantum mechanical calculation of interaction energies for molecules, especially large biological molecules such as proteins. Because of the large number of atoms involved, standard full quantum mechanical or ab initio calculation of the intermolecular interaction energy of such molecules is beyond computational reach. Currently, most theoretical studies of biological molecules employ classical force fields built on pairwise atomic interaction potentials. Despite the success of classical force field methods in many applications, they still have significant limitations, and quantum mechanical calculations of interaction energies are often required, e.g., in studying enzyme reactions.

[0004] Recently, a popular approach to applying quantum mechanical calculations to biological molecules has been the hybrid quantum mechanical/molecular mechanical (QM/MM) approach, which combines quantum mechanical methods with molecular force fields for large molecules. In this hybrid QM/MM approach, quantum mechanical or ab initio methods such as Hartree-Fock (HF) or density functional theory (DFT) are used to treat a small subsystem, while molecular force fields are used to treat the larger part of the system, such as solvent molecules. However, the QM/MM approach cannot provide a proper description of the interface between the QM and MM regions because the QM and MM approaches are inherently incompatible with each other.
[0005] Currently, there are two basic approaches to solving this problem: the link atom approach (or its variants) and the local self-consistent field (LSCF) method, both of which use strictly localized bond orbitals for the bonds between QM and MM atoms. Despite the progress these approaches have made in solving the interface problem, some artifacts still exist in applications of QM/MM methods.

[0006] Another approach for calculations on large systems is the linear scaling approach, in which the large system is divided into small subsystems and the calculation is performed for each subsystem individually. The linear scaling approach relies on the local nature of the interaction: the effect of an energy perturbation in one region is generally confined to its vicinity and decays rapidly with distance from it. The divide-and-conquer (DAC) method and similar methods are commonly employed in this approach. Although these methods scale linearly with the size of the system, their applications to proteins are currently limited to semi-empirical methods, and ab initio calculations of biological molecules using HF or DFT methods are not feasible.

[0007] There is thus a need for developing a practical and efficient full quantum mechanical method for calculating interaction energies of molecules such as proteins. This invention answers that need.

SUMMARY OF THE INVENTION 

[0008] A first embodiment of this invention relates to a method of introducing conjugated caps onto molecule fragments. In this method, a first molecule is provided. The molecule is then decomposed into two or more molecular fragments. One or more pairs of conjugated caps, each containing a first cap member and a second cap member, are introduced at one or more locations in the molecule, creating a plurality of molecular portions. Each molecular portion contains a fragment of the first molecule and at least one of the first and second cap members of the conjugated caps. A further embodiment uses the molecular portions to calculate the interaction energy between the first molecule and a second molecule.

[0009] A second embodiment of this invention relates to a computer-readable medium having stored instructions for calculating the inter-molecular interaction energy between two molecules. The stored instructions comprise instructions for (a) providing a first molecule; (b) decomposing the molecule into two or more molecular fragments; and (c) introducing one or more pairs of conjugated caps having a first cap member and a second cap member at one or more locations in the molecule to create a plurality of molecular portions. Each molecular portion contains a fragment of the first molecule and at least one of the first and second cap members of the conjugated caps.

[0010] A third embodiment of this invention relates to a system for calculating intermolecular interaction energy. The system contains (a) a molecular representation module that provides a first molecule; (b) a molecular decomposing module that decomposes the molecule into two or more molecular fragments; and (c) a molecular cap pair introduction module that introduces one or more pairs of conjugated caps having a first cap member and a second cap member at one or more locations in the molecule to create a plurality of molecular portions. Each molecular portion in the molecular representation contains a fragment of the first molecule and at least one of the first and second cap members of the conjugated caps.

[0011] A fourth embodiment of this invention relates to a composition. The composition contains a molecule having a plurality of units and a plurality of pairs of conjugated caps having a first cap member and a second cap member. Each of the plurality of pairs of conjugated caps is inserted between two of the units under conditions effective to substantially preserve the properties of the chemical bond that is cut to insert the pair of conjugated caps. The first cap member substantially mimics the electronic effect of the units of the molecule on a first side of the pair of conjugated caps, and the second cap member substantially mimics the electronic effect of the units of the molecule on a second side of the pair of conjugated caps.



BRIEF DESCRIPTION OF THE DRAWINGS

[0012] Fig. 1 is a graphical representation of an extended tripeptide and the locations of the cuts where conjugated caps are introduced.

[0013] Fig. 2 is a block diagram of a computer system for practicing the method of the preferred embodiment.

[0014] Fig. 3 is a flowchart depicting the method steps of the preferred embodiment.

[0015] Fig. 4 shows all-atom representations of three peptides: (a) the Gly-Gly-Gly tripeptide; (b) the Me-His-Ser-Me dipeptide, with both terminals replaced by methyl groups; and (c) the Gly-Ser-Ala-Asp-Val pentapeptide.

[0016] Fig. 5 shows the coordinate system, with the origin at the center of mass of Gly-Ser-Ala-Asp-Val. The interaction potential is calculated for the water molecule approaching the center of mass of the peptide from specified spherical angles (θ, φ).

[0017] Fig. 6 shows a comparison of ab initio and DFT calculations of the triglycine/water interaction potential between the MFCC and FS (full system) calculations using different basis sets. The approaching spherical angles of the water molecule are fixed at (90°, 0°).

[0018] Fig. 7 shows one-dimensional (1D) potential curves for the triglycine/water interaction in various directions obtained by MFCC and FS calculations using DFT B3LYP/6-31G. The solid lines with dots are the FS results, the dotted lines are the MFCC results, and the dashed lines are the results from the AMBER force field.

[0019] Fig. 8 shows 1D potential curves for the Me-His-Ser-Me/water interaction in various directions obtained by MFCC and FS calculations using DFT B3LYP/6-31G. The solid lines with dots are the FS results and the dotted lines are the MFCC results.

[0020] Fig. 9 shows 1D potential curves for the Gly-Ser-Ala-Asp-Val/water interaction in various directions obtained by MFCC and FS calculations using HF/3-21G. The solid lines with dots are the FS results, the dotted lines are the MFCC results, and the dashed lines are the results from the AMBER force field.

[0021] Fig. 10 shows 1D potential curves for the Gly-Ser-Ala-Asp-Val/water interaction in various directions obtained by MFCC and FS calculations using DFT B3LYP/6-31G. The solid lines with dots are the FS results and the dotted lines are the MFCC results.

[0022] Fig. 11 shows illustrative interaction paths between a water molecule (indicated by arrows) and the HIV-1 gp41 protein at a fixed structure. The distance is defined between the two atoms at the ends of each arrow, except for Fig. 11A, where the distance is defined between the oxygen atom of the water and the center of mass of gp41.

[0023] Fig. 12 shows the 1D (one-dimensional) gp41-water interaction potential curves as a function of the interaction paths defined in Fig. 11. The solid circles are the results of ab initio calculations and the dotted lines are the results from the AMBER force field. The B3LYP/6-31G and MP2/6-31G results are denoted, respectively, by open circles and open squares in (B) and (D).


DETAILED DESCRIPTION 

[0024] The first embodiment of this invention relates to a method of introducing conjugated caps onto molecular fragments. A first molecule is provided and then decomposed into two or more molecular fragments. Pairs of conjugated caps, each containing a first cap member and a second cap member, are introduced onto the molecular fragments at each decomposition point in the molecule. The pairs of caps are introduced such that, after introduction, each molecular portion contains a molecular fragment and at least one cap member. Fig. 3 is a flowchart depicting the method steps of the preferred embodiment.

[0025] When the molecule has been decomposed or cut, the individual pieces of the molecule are referred to as molecular fragments, whereas after the molecular fragments have been capped, they are referred to as molecular portions.

[0026] In the first step, a first molecule is provided. Preferably, the molecule is provided electronically, such as on a computer or other system capable of executing modeling software. Other means capable of providing a molecular, graphical, or mathematical representation of the molecule may also be used. For instance, the molecule may be provided as data plotted on a coordinate system or in another structural-information format that describes the molecule. Downloading information that describes the molecule from the Internet, such as coordinate data for the molecule, is an example of providing the molecule.

[0027] As is well known in the art, many different parties provide electronically downloadable information on molecules. The Protein Data Bank (PDB) at www.pdb.org is one example. The Protein Data Bank website, as well as other sites, can provide the molecule as a set of coordinates, with each atom of the molecule having distinct coordinates. A sample set of x, y, and z coordinates obtained from the Protein Data Bank for the protein gp41 is provided below:





Record  Serial  Atom   x (Å)    y (Å)    z (Å)
ATOM    1       N      26.801   20.370   -22.607
ATOM    2       H1     27.720   20.763   -22.465
ATOM    3       H2     26.112   21.023   -22.263
ATOM    4       H3     26.740   20.022   -23.553
ATOM    5       CA     26.672   19.167   -21.736
ATOM    6       HA     25.835   19.339   -21.059
ATOM    7       CB     26.403   17.903   -22.573
ATOM    8       HB     25.490   18.038   -23.152
ATOM    9       CG2    27.574   17.662   -23.520
ATOM    10      1HG2   28.488   17.526   -22.941


[0028] The coordinates may then be graphically displayed using a plotting program that displays a molecular image of the molecule in a format such as that shown in Fig. 4 or 5. Any VRML-compliant program or other program, such as RasMol, can be used to perform this function. While it is sometimes easier to conceptualize the features of the molecule when it is visually displayed, it is not necessary to display the molecule for the purposes of this invention.

[0029] Other software and hardware that have the capability of generating molecules in electronic format or on a computer-readable medium are acceptable. Additionally, other non-electronic means of providing the molecule known to those of skill in the art may also be used, e.g., providing one or more physical models of the molecules.

[0030] The first molecule may be selected from any molecule known in the art. Preferably, the first molecule is a polyatomic species. Larger molecules such as materials, proteins, polymers, DNA, or RNA are typically used as the first molecule because calculations of intermolecular interaction energies are most useful for these types of molecules. However, the first molecule may also be a smaller molecule, such as an ion, a water molecule, an inorganic molecule, an organic molecule, a drug molecule, or a biological molecule.

[0031] The first molecule is then decomposed into two or more molecular fragments. The molecule may be decomposed, i.e., cut, by any means known in the art. Preferably, the molecule is decomposed electronically, such as on a computer or via a molecular processing system. When the molecule is provided by means of structural information describing the molecule, the decomposition step may be effected by cutting the molecule into the desired molecular fragments based on the structural description. For instance, if the molecule is represented as a set of coordinates, with each atom of the molecule having distinct coordinates, the molecule can be decomposed by splitting it at the coordinates corresponding to the decomposition points. A set of coordinates may be input into the system to designate a decomposition point or a molecular fragment occupying that set of coordinates.

[0032] It is also within the capabilities of a skilled artisan to create a software program that decomposes a given molecule when specific decomposition points are provided. Appended to this disclosure is an example of source code of an executable computer program that can be used to decompose a molecule provided by means of a coordinate system. Different programs may be used and may be preferred depending on the hardware a user has at his or her disposal, the mechanism for providing such a program, and other factors determinable through routine experimentation.
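The appended source code is not reproduced in this section. As a hedged illustration of the idea in [0031] and [0032] only, the Python sketch below assumes the molecule is described by atom indices and a list of covalent bonds: the bonds chosen as decomposition points are removed, and each remaining connected set of atoms becomes a molecular fragment. The bond-list representation and the function name `decompose` are assumptions for illustration, not the appended program.

```python
from collections import defaultdict


def decompose(n_atoms, bonds, cut_bonds):
    """Split atoms 0..n_atoms-1 into fragments after removing cut_bonds.

    bonds: iterable of (i, j) covalent bonds.
    cut_bonds: the subset of bonds chosen as decomposition points.
    Returns a list of fragments, each a sorted list of atom indices.
    """
    cuts = {frozenset(b) for b in cut_bonds}
    adjacency = defaultdict(list)
    for i, j in bonds:
        if frozenset((i, j)) in cuts:
            continue  # severed bond: do not connect across it
        adjacency[i].append(j)
        adjacency[j].append(i)

    seen = set()
    fragments = []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Depth-first search collects one connected fragment.
        stack, fragment = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            fragment.append(atom)
            for nbr in adjacency[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        fragments.append(sorted(fragment))
    return fragments


# A linear chain of 6 atoms, cut between atoms 1-2 and 3-4.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(decompose(6, chain, [(1, 2), (3, 4)]))  # → [[0, 1], [2, 3], [4, 5]]
```

Working on the bond graph rather than on raw coordinates keeps the cut independent of geometry; the coordinates of each atom simply travel with its fragment.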

[0033] Cuts should be made across covalent bonds, preferably across covalent single bonds. However, cuts may be made across all types of bonds, including double and triple bonds. Cuts may also be made across ring structures, such as benzene rings. Since the number of cuts determines the number of molecular fragments, a skilled artisan may choose to make many cuts or only a few cuts in the molecule, depending on how many molecular fragments are deemed necessary or desirable. The number of cuts and of desired molecular fragments depends on the size of the molecule, the configuration of the molecule, the purpose of the cuts, and other factors that may be determined by routine experimentation by those of skill in the art.

[0034] When discussing the decomposition of the molecule, it is useful to consider a theoretical example, such as the decomposition of a protein molecule P with N amino acids. At a given (fixed) structure, this molecule can be represented as

$$P = nA_1 - A_2 - A_3 - \cdots - A_N c$$

where $A_i$ ($i = 1, \ldots, N$) are the individual amino acid units and n is the N-terminal of the protein,

$$n = \mathrm{NH_3^+} \quad (\mathrm{NH_2})$$

for the charged (neutral) N-terminal. The C-terminal of the protein is represented as

$$A_N c = R_N\mathrm{C}_\alpha\mathrm{HCOO}^- \quad (R_N\mathrm{C}_\alpha\mathrm{HCOOH})$$

for the charged (neutral) C-terminal. Figure 1 shows the sequence of a general three-amino-acid peptide (tripeptide) with charged terminals.

[0035] As shown in Fig. 1, the cuts could be made across the carbon-nitrogen bonds of the tripeptide. In this case, the points where the carbon-nitrogen bonds are cut represent the decomposition points for that molecule. Of course, cuts do not have to be made across all covalent bonds in the molecule. For instance, the cuts could also be made at the peptide bond for certain advantages or conveniences. As long as the cuts are made in a manner that allows a cap to be introduced onto the molecular fragment at the decomposition point, it is not critical where the cuts are made in the molecule or how many cuts are made. For larger molecules, many cuts will typically be made, resulting in many molecular fragments. For smaller molecules, only a few cuts, perhaps a single cut, may be needed.

[0036] After the molecule has been cut, at least one pair of caps that are conjugate to each other is introduced at one or more points in the molecule. Caps may be introduced onto the molecular fragments by any means known in the art. Preferably, the caps are introduced by electronically inserting the molecular caps at the decomposition points in the molecule. A molecular processing system may be used to introduce or insert the caps.

[0037] For molecules provided by means of structural information describing the molecule, the molecular caps may be introduced based on that structural information. If the molecule is represented by a set of coordinates identifying the atoms in the molecule and is decomposed at certain identified decomposition points, molecular caps may be introduced onto each molecular fragment by entering the coordinates of the atoms of the cap member at the desired decomposition point. The coordinates may then be converted to a visual representation using various modeling programs, such as VRML-compliant programs, to check the accuracy of the cap introduction.
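One common way to generate coordinates for a cap atom, shown here as a hedged sketch rather than as the procedure of this disclosure, is to saturate a severed valence with a hydrogen atom placed along the original bond direction at a typical bond length, in the manner of link atoms in QM/MM. The bond lengths used (about 1.09 Å for C-H, 1.01 Å for N-H, 0.96 Å for O-H) are approximate typical values assumed for illustration, and the function name `place_cap_hydrogen` is hypothetical.

```python
import math

# Approximate typical X-H bond lengths in angstroms (assumed values).
XH_BOND_LENGTH = {"C": 1.09, "N": 1.01, "O": 0.96}


def place_cap_hydrogen(kept_xyz, removed_xyz, kept_element):
    """Return coordinates for an H atom that replaces the removed neighbor.

    The H lies on the line from the kept atom toward the atom that was
    cut away, at a typical X-H distance for the kept atom's element.
    """
    d = [r - k for r, k in zip(removed_xyz, kept_xyz)]
    norm = math.sqrt(sum(c * c for c in d))
    scale = XH_BOND_LENGTH[kept_element] / norm
    return tuple(k + c * scale for k, c in zip(kept_xyz, d))


# A 1.47 A C-N bond along x; the N side is cut away and the C is capped.
h = place_cap_hydrogen((0.0, 0.0, 0.0), (1.47, 0.0, 0.0), "C")
print(h)
```

Larger cap groups, such as the ones discussed for peptides below, can reuse the original coordinates of the retained atoms and apply a placement like this only to the newly saturated valences.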

[0038] Each pair of caps contains two cap members, a first cap member and a second cap member. A pair of conjugate caps is introduced into the molecule at each decomposition point. One cap member is introduced onto each molecular fragment so that the members of each pair of conjugate caps are aligned adjacent to one another. For illustrative purposes, the caps may be designated $C_i^{cap}$ and $C_i^{*cap}$, where $i = 1, \ldots, N$. For example, in Fig. 1, cap $C_1^{cap}$ is used to terminate the right end of molecular fragment $nA_1$ at the first decomposition point, while its conjugate cap $C_1^{*cap}$ is employed to terminate the left end of molecular fragment $A_2$; similarly, cap $C_2^{cap}$ is employed to terminate the right end of molecular fragment $A_2$ at the second decomposition point, while its conjugate cap $C_2^{*cap}$ is used to terminate the left end of molecular fragment $A_3c$.

[0039] The caps should be introduced onto the molecular fragments so that each molecular portion contains the molecular fragment and at least one cap member. In Fig. 1, the left-hand molecular portion contains molecular fragment $nA_1$ and cap member $C_1^{cap}$; the middle molecular portion contains molecular fragment $A_2$ and cap members $C_1^{*cap}$ and $C_2^{cap}$; the right-hand molecular portion contains molecular fragment $A_3c$ and cap member $C_2^{*cap}$. Thus, the right- and left-hand molecular portions each contain a molecular fragment and one cap member, while the middle molecular portion contains a molecular fragment and two cap members.
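The bookkeeping of which fragment receives which cap can be expressed compactly. The Python sketch below uses plain string labels rather than real chemical structures (an assumption made purely for illustration; the function name `build_portions` is hypothetical). For a chain of N units it builds the molecular portions Cap*_(i-1) + A_i + Cap_i, with no conjugate cap on the first portion and no cap on the last, together with the N-1 artificial molecules Cap_i + Cap*_i formed by each conjugate pair.

```python
def build_portions(n_units):
    """Return (portions, conjugate_cap_molecules) as lists of label lists.

    Units are labeled A1..AN; the first unit carries the N-terminal n and
    the last carries the C-terminal c, as in P = nA1-A2-...-ANc.
    """
    units = [f"A{i}" for i in range(1, n_units + 1)]
    units[0] = "n" + units[0]
    units[-1] = units[-1] + "c"

    portions = []
    for i in range(1, n_units + 1):
        portion = []
        if i > 1:
            portion.append(f"Cap*{i - 1}")  # conjugate cap on the left end
        portion.append(units[i - 1])
        if i < n_units:
            portion.append(f"Cap{i}")  # cap on the right end
        portions.append(portion)

    # Each decomposition point i yields one artificial molecule Cap_i + Cap*_i.
    conjugate_caps = [[f"Cap{i}", f"Cap*{i}"] for i in range(1, n_units)]
    return portions, conjugate_caps


portions, caps = build_portions(3)
print(portions)  # [['nA1', 'Cap1'], ['Cap*1', 'A2', 'Cap2'], ['Cap*2', 'A3c']]
print(caps)      # [['Cap1', 'Cap*1'], ['Cap2', 'Cap*2']]
```

For the tripeptide of Fig. 1 this reproduces the three portions listed above: two end portions with one cap member each and a middle portion with two.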

[0040] Caps are atoms or radicals that bond with the fragment of the molecule that has been severed at the decomposition point. The caps serve two purposes. First, they preserve the properties of the valence bond being cut. The caps should preserve these properties as closely as possible, serving a purpose similar to that of the link atom in the QM/MM approach discussed above. Second, the caps should mimic as closely as possible the effect of the original molecular part that has been cut away from the remaining fragment. For example, in Fig. 1, $C_1^{cap}$ should closely represent the electronic effect of everything to the right of the first decomposition point, and $C_1^{*cap}$ should closely represent the electronic effect of $nA_1$ on the $A_2$ unit.

[0041] One skilled in the art may choose suitable molecular caps from various possibilities using the criteria described above. This, of course, also applies to the molecular caps $C_i^{cap}$ and $C_i^{*cap}$. In Fig. 1, the first (N-terminal) cap could be $\mathrm{NH_3^+}$ ($\mathrm{NH_2}$) for the charged (neutral) N-terminal. Other caps placed in the middle of the molecule may be, for example,

$$C_i^{cap} = R_{i+1}\mathrm{C}_\alpha\mathrm{H}_2$$

for $i = 1, \ldots, N-1$. The right-end (C-terminal) cap is simply defined as

$$C_N^{cap} = R_N\mathrm{C}_\alpha\mathrm{HCOO}^- \quad (R_N\mathrm{C}_\alpha\mathrm{HCOOH})$$

for the charged (neutral) C-terminal (cf. Fig. 1). The corresponding conjugate caps are then

$$C_i^{*cap} = \mathrm{NH}_2$$

for $i = 1, \ldots, N-1$.

[0042] After a molecule has been decomposed and capped with conjugated caps to create a plurality of molecular portions, the molecular portions may be used to calculate the intermolecular interaction energy. An intermolecular interaction energy calculation determines the energy of interaction between two given molecules, i.e., the energy of the combined two-molecule system minus the energies of the separated molecules. When calculating intermolecular interaction energy, at least two molecules are provided. In the method of this invention, the second molecule may be any molecule that one wishes to use in comparison to or in reference to the first molecule. Typically, the second molecule is a smaller molecule, such as an ion, a water molecule, an inorganic molecule, an organic molecule, a drug molecule, or a biological molecule. Water is perhaps the most common second molecule used in basic intermolecular interaction energy calculations. When calculating intermolecular interaction energies for proteins and peptides, drug molecules and biological molecules are preferred second molecules because of the practical benefit associated with protein inhibitors in drug discovery. However, any molecule may be used as the second molecule, including those listed as acceptable molecules for the first molecule.
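The interaction energy of two molecules is commonly computed in the supermolecular way: the energy of the two-molecule complex minus the energies of the isolated molecules at the same geometries. The Python sketch below shows only that arithmetic. The energy values are hypothetical placeholders standing in for results from a quantum chemistry program, the function name `interaction_energy` is illustrative, and refinements such as counterpoise correction for basis set superposition error are not included.

```python
# 1 hartree expressed in kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095


def interaction_energy(e_complex, e_first, e_second):
    """Supermolecular interaction energy, in the same units as the inputs."""
    return e_complex - (e_first + e_second)


# Hypothetical total energies in hartree (placeholders, not computed results).
e_ab, e_a, e_b = -152.1230, -76.0580, -76.0570
de = interaction_energy(e_ab, e_a, e_b)
print(round(de * HARTREE_TO_KCAL_PER_MOL, 2))
```

A negative value indicates that the complex is more stable than the separated molecules.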

[0043] Once the first molecule has been decomposed into molecular fragments having conjugate caps, the intermolecular interaction energy between the first molecule and a second molecule can be calculated. Although interaction energies may now be calculated, the use of molecules having molecular portions with conjugated caps is not limited to interaction energy calculations. For instance, such molecules may also be used in calculations of the electron density, dipole moment, electrostatic potential, and intramolecular energy of the molecules.

[0044] In this calculation, the interaction energy is determined between each of the molecular portions of the first molecule and the second molecule. The interaction energy may be calculated using well-known full quantum mechanical or ab initio methods; however, other interaction energy calculations known to those of skill in the art may also be used. Preferably, software or hardware is used to perform the calculations. The Gaussian™ software is capable of performing full quantum mechanical calculations of interaction energies; the program may be obtained at www.gaussian.com. These interaction energies are then summed to provide the total interaction energy of the molecular portions.

[0045] Likewise, the conjugated cap interaction energy may be determined for each pair of conjugated caps and the second molecule. The same interaction energy calculation used for determining the interaction energies of the molecular portions, e.g., the full quantum mechanical or ab initio calculation, should also be used for determining the interaction energies of the conjugated caps. The total interaction energy of the conjugated caps may then be determined by summing all the conjugated cap interaction energies.

[0046] Once the total interaction energy of the molecular portions and the total interaction energy of the conjugated caps have been calculated, the intermolecular interaction energy between the first molecule and the second molecule can be determined by subtracting the total interaction energy of the conjugated caps from the total interaction energy of the molecular portions.
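The combination described in [0044] to [0046] amounts to two sums and a subtraction. The Python sketch below assumes that the interaction energy of each capped molecular portion, and of each conjugate-cap molecule, with the second molecule has already been obtained (for example, from separate quantum chemistry runs). The numbers are hypothetical placeholders and `mfcc_interaction_energy` is an illustrative name.

```python
def mfcc_interaction_energy(portion_energies, conjugate_cap_energies):
    """Combine fragment results as described for the MFCC method.

    portion_energies: interaction energies of each capped molecular
        portion with the second molecule.
    conjugate_cap_energies: interaction energies of each conjugate-cap
        molecule (Cap_i + Cap*_i) with the second molecule.
    """
    total_portions = sum(portion_energies)
    total_caps = sum(conjugate_cap_energies)
    # Subtracting the cap terms removes the extra interactions
    # contributed by the artificial caps.
    return total_portions - total_caps


# Hypothetical values (kcal/mol) for a tripeptide: 3 portions, 2 cap pairs.
result = mfcc_interaction_energy([-2.1, -3.4, -1.2], [-0.6, -0.5])
print(result)
```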

[0047] In a preferred embodiment, a molecular interaction energy system is used to sum the molecular portion and conjugated cap interaction energies, and an interaction adjustment system is used to subtract the total interaction energy of the conjugated caps from the total interaction energy of the molecular portions.

[0048] The calculation of interaction energies with this process, termed the molecular fractionation with conjugate caps (MFCC) method, aims to provide accurate interaction energies for molecules, especially large polyatomic molecules such as proteins, by means of full quantum mechanical electronic structure calculations. By breaking the molecule into individual amino acid fragments that are properly capped, the interaction energy of a second molecule with the first molecule at a given structure can be obtained by a proper combination of the interaction energies between the second molecule and the individually capped protein fragments of the first molecule. The extra interactions between the second molecule and the introduced caps are canceled by subtracting the interactions between the second molecule and the artificial molecules formed by the conjugate caps. The MFCC scheme is particularly suitable for obtaining accurate ab initio interaction energies between a protein with a fixed structure and another molecule. The MFCC scheme is highly efficient for ab initio calculations and scales linearly with the size of the first molecule. In addition, because the interaction energies between the second molecule and the individual fragments of the first molecule can be calculated independently, the scheme is particularly suitable for calculations on multi-node computer clusters.

[0049] The basic approach to calculating interaction energy using MFCC is based on the hypothesis that the first-molecule/second-molecule interaction energy is localized. While not wishing to be bound by this theory, it is believed that the interaction energy between the molecules can be accurately represented as a sum over the interactions between the second molecule and the individual fragments of the first molecule. In this approach, contributions to the interaction of the second molecule with the first molecule that involve simultaneous multi-fragment interactions are assumed to be negligible.
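Under this locality hypothesis, the combination set out in [0044] to [0046] can be restated as a single expression. In the notation of [0034] and [0038], and writing W for the second molecule and E(X, W) for the interaction energy of species X with W (symbols introduced here only for compactness), for a protein with N ≥ 2 units:

$$E(P, W) \approx E(nA_1C_1^{cap}, W) + \sum_{i=2}^{N-1} E(C_{i-1}^{*cap}A_iC_i^{cap}, W) + E(C_{N-1}^{*cap}A_Nc, W) - \sum_{i=1}^{N-1} E(C_i^{cap}C_i^{*cap}, W)$$

The first three terms cover the capped molecular portions, and the last sum removes the interactions of W with the artificial conjugate-cap molecules; for N = 2 the middle sum is empty. This is an editorial restatement of the text above, not an additional result.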

[0050] Computing systems may be used to run the interaction energy calculations. A different computing system or device may be used for each calculation, or a single computing system may be used to run all the calculations together. It may be preferable to use a different computing system or device for each molecular portion. For example, a first computing system may be used to calculate the interaction energy between a first molecular portion and the second molecule, a second computing system may be used to calculate the interaction energy between a second molecular portion and the second molecule, and additional computing systems may be used for each additional molecular portion. More than one computing system or device may also be used for each portion of the molecule. The calculations may be performed on parallel or multiprocessor computers or on other systems known in the art.
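Because each molecular portion and each conjugate-cap molecule is treated independently, the individual calculations can be dispatched concurrently. The Python sketch below uses the standard `concurrent.futures` module. `fake_energy` and its lookup table are hypothetical stand-ins for launching a real quantum chemistry job, for instance on a separate cluster node; on a real cluster a job scheduler would typically fill this role.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical precomputed results (kcal/mol) standing in for real jobs.
_PLACEHOLDER_RESULTS = {
    "nA1+Cap1": -2.1, "Cap*1+A2+Cap2": -3.4, "Cap*2+A3c": -1.2,
    "Cap1+Cap*1": -0.6, "Cap2+Cap*2": -0.5,
}


def fake_energy(label):
    """Placeholder for an expensive per-fragment interaction energy job.

    A real implementation would write an input file for a quantum
    chemistry program, run it, and parse the resulting energy.
    """
    return _PLACEHOLDER_RESULTS[label]


def run_parallel(portions, cap_pairs, energy_fn=fake_energy, max_workers=4):
    """Evaluate all fragment jobs concurrently, then combine them MFCC-style."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so the sums below do not
        # depend on which job finishes first.
        portion_energies = list(pool.map(energy_fn, portions))
        cap_energies = list(pool.map(energy_fn, cap_pairs))
    return sum(portion_energies) - sum(cap_energies)


portions = ["nA1+Cap1", "Cap*1+A2+Cap2", "Cap*2+A3c"]
cap_pairs = ["Cap1+Cap*1", "Cap2+Cap*2"]
print(run_parallel(portions, cap_pairs))
```

Threads are used here on the assumption that each task mostly waits on an external program; for CPU-bound work done in Python itself, `ProcessPoolExecutor` would be the analogous choice.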

[0051] Fig. 2 illustrates computer system 100 in accordance with a preferred embodiment that can be used to carry out the methods of the invention. Computer system 100 can include a variety of devices and can be embodied in a personal computer, workstation, or the like. The various devices can be coupled in any manner, such as over a LAN, a WAN, or other channels. Computer system 100 includes user interface (UI) 110, which provides all communications and interactions between computer system 100 and a user in a known manner. UI 110 can include a display and a keyboard or other input device. Further, UI 110 can include any necessary software and/or hardware interfaces for effecting the interface between the user and system 100 in a known manner. For example, UI 110 can include software to implement the standard WINDOWS™ user interface.

[0052] Computer system 100 also includes processor 120, which can be any type of known processor, such as a PENTIUM IV™, POWERPC™, or other processor. Processor 120 executes instructions stored as software code in memory device 130 and/or other memory devices. Memory device 130 includes computer-readable media, such as a hard disk, a CD, a DVD, a floppy disk, or any other type of media for storing computer-readable instructions. Instructions are read from memory device 130 in a known manner to execute the instructions on processor 120. Of course, there can be other instructions, such as an operating system or the like, to facilitate execution of the instructions stored on memory device 130. Also, memory device 130 can consist of plural devices or a single device.

[0053] Memory device 130 includes molecule generation module 132, which provides instructions for selecting/generating, i.e., providing, a molecule in the manner described above. For example, molecule generation module 132 can include known electronically downloadable databases, such as the Protein Data Bank, located at www.pdb.org. Memory device 130 also includes molecule decomposing module 134 for accomplishing the decomposing step described above. Molecule decomposing module 134 can also include program code designed to decompose molecules, such as the program code appended to this disclosure. Similarly, cap introduction module 136 includes instructions for accomplishing the cap introduction step described above. This also can be accomplished using the same program code designed to decompose the molecule or different program code. Finally, energy calculation module 138 includes instructions for calculating the intermolecular interaction energy as described above. The programming steps required for implementing energy calculation module 138 are well within the ability of a skilled programmer in light of the functional disclosure provided herein. Energy calculation module 138 can include energy calculation software such as the programs produced by Gaussian™.

[0054] The second embodiment of this invention relates to a computer-readable medium, such as memory device 130, having stored instructions for calculating the intermolecular interaction energy between two molecules. When the instructions are executed by at least one processor, the execution causes the processor to perform the following steps:

(a) providing a first molecule;
(b) decomposing the molecule into two or more molecular fragments; and
(c) introducing one or more pairs of conjugated caps, each pair having a first cap member and a second cap member, at one or more locations in the molecule to create a plurality of molecular portions.

Each molecular portion contains a fragment of the first molecule and at least one of the first and second cap members of the conjugated caps.
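The Python sketch below illustrates steps (b) and (c) for the simplest case: a linear chain of N units cut between every pair of neighboring units. It handles only the bookkeeping. Cut i, which lies between units i and i+1, receives one pair of conjugated caps. Its first cap member caps the right end of fragment i, and its second cap member caps the left end of fragment i+1. Note the following:

- The cutting scheme is an assumption made for illustration.
- All names (`CappedPortion`, `decompose_linear_chain`) are hypothetical.
- This is not the program code appended to this disclosure.
- No atomic coordinates are handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CappedPortion:
    """One molecular portion: fragment i plus the cap members attached to it."""
    fragment: int              # index i of the fragment (1-based)
    left_cap: Optional[int]    # cut whose second cap member caps the left end
    right_cap: Optional[int]   # cut whose first cap member caps the right end


def decompose_linear_chain(n_units: int) -> Tuple[List[CappedPortion], List[int]]:
    """Cut a linear chain after every unit (hypothetical scheme).

    Returns the capped portions and the list of cuts. Cut i lies between
    unit i and unit i + 1 and carries one pair of conjugated caps.
    """
    if n_units < 1:
        raise ValueError("the chain needs at least one unit")
    cuts = list(range(1, n_units))
    portions = [
        CappedPortion(
            fragment=i,
            left_cap=i - 1 if i > 1 else None,      # start of chain: no cap
            right_cap=i if i < n_units else None,   # end of chain: no cap
        )
        for i in range(1, n_units + 1)
    ]
    return portions, cuts


if __name__ == "__main__":
    portions, cuts = decompose_linear_chain(4)
    for p in portions:
        print(p)
    print("cuts:", cuts)
```

Each cut index appears exactly twice in the output: once as a right cap and once as a left cap. This mirrors the pairing of first and second cap members described above.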

[0055] The computer-readable medium may also include instructions for calculating the intermolecular interaction energy between the first molecule and a second molecule. The processor may perform this function, in addition to the functions described above, during the same execution. The interaction energy is calculated in the same way as described above for the first embodiment of the invention.

[0056] The third embodiment of this invention relates to a system for calculating intermolecular interaction energy. The system contains:

(a) a molecular representation module that provides a first molecule;
(b) a molecular decomposing module that decomposes the molecule into two or more molecular fragments; and
(c) a molecular processing module that introduces one or more pairs of conjugated caps, each pair having a first cap member and a second cap member, at one or more locations in the molecule to create a plurality of molecular portions.

Each molecular portion in the molecular representation contains a fragment of the first molecule and at least one of the first and second cap members of the conjugated caps.

[0057] The system for calculating intermolecular interaction energy may be a computer system, although other systems that perform similar functions are envisioned. A first molecule may be provided in the form of a molecular, graphical, or mathematical representation. An example of a mathematical representation is a molecule defined by a set of (x, y, z) coordinates, in which each atom of the molecule occupies a distinct location. Molecular processing systems or modules perform the remaining functions: decomposing the molecule, introducing conjugated caps onto the molecule fragments, and determining or calculating the interaction energies. Using molecular, graphical, or mathematical representations together with molecular processing modules allows the process to be performed electronically, which is the preferred means of execution.

[0058] A fourth embodiment of this invention relates to a composition. The composition contains a molecule having a plurality of units, and a plurality of pairs of conjugated caps having a first cap member and a second cap member. Each of the plurality of pairs of conjugated caps has been inserted between two of the units under conditions effective to substantially preserve the properties of the chemical bond that is cut to insert the pair of conjugated caps. The first cap member substantially mimics the electronic effect of the units of the molecule on a first side of the pair of conjugated caps. The second cap member substantially mimics the electronic effect of the units of the molecule on a second side of the pair. The function of the cap members is discussed above in the disclosure relating to the selection of the cap members.
[0059] The composition may be formed by the methods described above, i.e., the molecule may be provided, decomposed, and have conjugated caps introduced onto the molecule fragments. This forms a composition wherein the molecule contains a plurality of units that are separate from one another.
[0060] Alternatively, the pairs of conjugated caps may be fused or otherwise linked to one another to attach the molecular portions to one another in forming the composition. Since each cap member is a radical that lies adjacent to its conjugate, the conjugate cap members may fuse or bond with each other through processes well known in the art. When the units are fused or linked together, the molecule can become a single continuous composition.

[0061] Each of the four embodiments described above may be used to determine the intermolecular interaction energy between two molecules. The steps of providing a molecule, decomposing the molecule, and introducing conjugated caps onto the molecule fragments may also be used to calculate the electron density, dipole moment, electrostatic potential, and other properties. The determination of the electron density ρ, dipole moment d, and electrostatic potential Φ follows essentially the same method steps as the determination of the intermolecular energy disclosed above.

Interaction energy calculations using a protein molecule 

[0062] This molecular fractionation with conjugated caps (MFCC) calculation may be expressed abstractly using the example discussed above: a protein molecule P with N amino acids. Let V(M-P) denote the interaction energy between the molecule M (the second molecule) and the protein P (the first molecule). The fractionation scheme above then represents V(M-P) as

$$V(M-P) = \sum_{i} V\left(M - \mathrm{Cap}_{i-1}^{*}\,A_i\,\mathrm{Cap}_{i}\right) - \sum_{i} V\left(M - \mathrm{Cap}_{i}^{*}\,\mathrm{Cap}_{i}\right) \qquad (1)$$

The first term in Eq. (1), $V(M - \mathrm{Cap}_{i-1}^{*}A_i\mathrm{Cap}_{i})$, represents the interaction energy between the molecule M and the capped protein fragment $\mathrm{Cap}_{i-1}^{*}A_i\mathrm{Cap}_{i}$. In this capped fragment, both ends of the fragment $A_i$ are capped with covalent bonds. The second term in Eq. (1) is the interaction between the molecule M and an artificial molecule formed from the conjugate caps, $\mathrm{Cap}_{i}^{*}\mathrm{Cap}_{i}$. The first sum runs over all capped fragments, and the second runs over all pairs of conjugated caps.

The calculated interaction energies are normalized by subtracting their values at some asymptotic geometry. The geometries of the cap atoms are kept exactly the same in the calculation of both interaction energies in Eq. (1), so that the artificial interactions between the molecule M and the caps cancel. The energy given by Eq. (1) is the proper intermolecular energy between the protein P, with a fixed structure, and the molecule M. It does not give the correct internal energy of the protein itself.
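Once the individual ab initio energies are available, Eq. (1) reduces to bookkeeping. The Python sketch below assumes that two energies of the complex with M have been computed for every capped fragment and every conjugate-cap molecule, with identical cap-atom positions in both:

- one at the geometry of interest; and
- one with M at the asymptotic reference geometry.

The function names are hypothetical. The numbers in the usage lines are placeholders, not results.

```python
from typing import Sequence, Tuple

# (energy of the complex at the geometry of interest,
#  energy of the same complex with M at the asymptotic reference geometry)
EnergyPair = Tuple[float, float]


def normalized(pair: EnergyPair) -> float:
    """Interaction energy measured relative to the asymptotic geometry."""
    at_geometry, asymptotic = pair
    return at_geometry - asymptotic


def mfcc_interaction_energy(fragments: Sequence[EnergyPair],
                            concaps: Sequence[EnergyPair]) -> float:
    """Eq. (1): capped-fragment terms minus conjugate-cap terms."""
    return (sum(normalized(p) for p in fragments)
            - sum(normalized(p) for p in concaps))


if __name__ == "__main__":
    # Placeholder energies: two capped fragments and one conjugate-cap molecule.
    v = mfcc_interaction_energy([(-10.0, -8.0), (-5.0, -5.5)], [(-3.0, -2.0)])
    print(v)  # -0.5
```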

[0063] Using Eq. (1), the interaction energy between a protein P and a molecule M can be obtained by simple summation over the individual interaction energies between the molecule and the capped protein fragments. Each of these energies can be obtained by ab initio calculations such as HF, DFT, or even higher-level quantum chemistry methods. The method therefore scales linearly with the size of the protein. The individual interaction energies in Eq. (1) are calculated independently of one another. The method can therefore be easily parallelized, and it is especially suitable for quantum calculation of interaction energies between proteins and, for example, drug molecules. Interactions between proteins and other molecules can be obtained with this method in the same way. The method may also be applied to materials other than proteins, such as peptides, polymers, DNA, and RNA.
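No term in Eq. (1) depends on another, so the individual jobs can be dispatched concurrently. The sketch below uses a thread pool from Python's standard `concurrent.futures` module. This is adequate when each job mainly waits on an external quantum chemistry program; on a cluster, a job scheduler would take the pool's place. `placeholder_job` is a hypothetical stand-in that returns a dummy number instead of launching a real calculation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def evaluate_in_parallel(job: Callable[[str], float],
                         subsystems: Sequence[str],
                         max_workers: int = 4) -> List[float]:
    """Run one independent energy job per subsystem; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(job, subsystems))


def placeholder_job(label: str) -> float:
    """Hypothetical stand-in for one ab initio calculation on a subsystem."""
    return -float(len(label))  # deterministic dummy value, not a real energy


if __name__ == "__main__":
    fragments = ["frag1", "frag2", "frag3"]   # capped fragments (labels only)
    concaps = ["cc1", "cc2"]                  # conjugate-cap molecules
    e_frag = evaluate_in_parallel(placeholder_job, fragments)
    e_cc = evaluate_in_parallel(placeholder_job, concaps)
    print(sum(e_frag) - sum(e_cc))            # Eq. (1) on the dummy values
```

`pool.map` returns results in the order of the inputs. Each energy therefore stays matched to its subsystem no matter which job finishes first.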
[0064] The conjugate caps can then be coupled to form artificial molecular species. The interaction of these species with the external molecule is calculated to cancel out the artificial interaction of that molecule with the individual caps. Thus the calculation of the original interaction energy between the molecule M and the protein P can be replaced by the calculation of interaction energies between the molecule M and individual protein fragments. Two kinds of species must have their interactions with the molecule calculated. The first are the capped protein fragments, having the molecular formula

$$\mathrm{Cap}_{i-1}^{*}A_i\mathrm{Cap}_i = \mathrm{NH_2R}_i\mathrm{C}^{\alpha}\mathrm{HCOHNR}_{i+1}\mathrm{C}^{\alpha}\mathrm{H_2}$$

The second are the coupled caps, having the molecular formula

$$\mathrm{Cap}_i^{*}\mathrm{Cap}_i = \mathrm{NH_2R}_{i+1}\mathrm{C}^{\alpha}\mathrm{H_2}$$

Since these fragments are relatively small molecules, their interaction energies with the molecule M can be calculated efficiently by ab initio methods. These individual interaction energies are calculated independently of each other. The desired ab initio calculations can therefore easily be performed on parallel or multiprocessor computers to achieve greater real-time throughput.
[0065] The atomic positions of the cap atoms should be exactly the same as those of the cut-off protein parts replaced by the caps. This avoids possible artifacts due to placing atoms in empty regions of configuration space.
EXAMPLES

[0066] The following examples refer to the figures and the results shown in them. These examples and numerical tests are intended to illustrate, not limit, the invention.

[0067] The above approach has been tested on a number of peptides interacting with a water molecule, and the results are compared with full system (FS) ab initio calculations. Three different peptides were chosen, as shown in Fig. 4:

- The first peptide is composed of three glycines (Gly-Gly-Gly) with charged terminals, as shown in Fig. 4A. This peptide has a stretched structure whose energy was not optimized.
- The second peptide is composed of two amino acids with both ends capped by methyl groups, i.e., Me-His-Ser-Me, as shown in Fig. 4B. The structure of this peptide has been optimized using the AMBER force field.
- The third is a five-residue peptide, Gly-Ser-Ala-Asp-Val (SEQ ID NO. 1), whose structure has also been optimized using the force field.

The interaction energies between these three fixed-structure peptides and a water molecule in the gas phase were calculated. The MFCC results were compared with the corresponding full system ab initio calculations. All ab initio calculations were performed using the Gaussian98 package.
[0068] No geometry optimization was done to find minimum-energy structures of the peptide/water complex. Instead, different directions along which the water molecule approaches the peptides were selected. Figure 5 shows the coordinate system used. The origin of the space-fixed coordinate system is at the center of mass of the Gly-Ser-Ala-Asp-Val peptide, whose geometry is frozen. The water molecule approaches the center from different spherical angles (θ, φ). Similar coordinate systems are used for the other two peptides. To minimize the number of coordinate changes, the water molecule stays rigid, with the orientation shown in Fig. 5, along each potential curve to be calculated.
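The approach geometry can be generated as in the Python sketch below. The sketch rests on two assumptions, neither of which is stated for Fig. 5:

- The spherical angles follow the usual physics convention: θ is measured from the z axis and φ in the x-y plane from the x axis, both in degrees.
- The scan distance r runs from the center of mass of the frozen peptide to one chosen atom of the water, for example its oxygen.

The function names and the toy coordinates are hypothetical. The water is moved by pure translation, so its orientation stays fixed along the path.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def center_of_mass(coords: Sequence[Vec], masses: Sequence[float]) -> Vec:
    """Mass-weighted average position of a set of atoms."""
    total = sum(masses)
    x, y, z = (sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3))
    return (x, y, z)


def approach_direction(theta_deg: float, phi_deg: float) -> Vec:
    """Unit vector for spherical angles (theta, phi) given in degrees."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))


def place_rigid_water(water: Sequence[Vec], anchor: int, origin: Vec,
                      theta_deg: float, phi_deg: float, r: float) -> List[Vec]:
    """Translate the water rigidly so that atom `anchor` lies at distance r
    from `origin` along (theta, phi); the orientation is unchanged."""
    u = approach_direction(theta_deg, phi_deg)
    target = [origin[k] + r * u[k] for k in range(3)]
    shift = [target[k] - water[anchor][k] for k in range(3)]
    return [(a[0] + shift[0], a[1] + shift[1], a[2] + shift[2]) for a in water]


if __name__ == "__main__":
    # Placeholder water geometry in angstroms (oxygen first), not that of Fig. 5.
    water = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
    # Toy two-atom "peptide" used only to obtain a center of mass.
    com = center_of_mass([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [12.0, 12.0])
    for r in (2.0, 3.0, 4.0):
        print(r, place_rigid_water(water, 0, com, 90.0, 0.0, r)[0])
```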

[0069] Figure 6 shows 1D potentials from ab initio calculations for the triglycine/water interaction, in which the water molecule approaches the mass center of the peptide along the spherical angle (90, 0). In Fig. 6, the MFCC results calculated using HF and DFT methods with different basis sets are compared with the corresponding full system ab initio calculations. Different methods and basis sets give sizable differences among the ab initio results. Even so, Fig. 6 shows that the MFCC results agree excellently with the corresponding FS calculations across the board. For example, the HF calculation with a 3-21G basis set gives a minimum energy about 5 kcal/mol lower than that calculated using a 6-31G basis set. The DFT/B3LYP results with the 6-31G and 6-31G* basis sets are very close to each other and lie between the two sets of HF results. In all four sets of calculations, the MFCC results are in excellent agreement with those from the corresponding FS calculations.

[0070] More results of calculations for the triglycine/water system at different geometries are shown in Fig. 7. Here the DFT B3LYP/6-31G method has been used for all ab initio calculations shown in Fig. 7, in both the MFCC and full system calculations. Six approach directions of the water toward the peptide, with different spherical angles, are shown. In all six, the MFCC results agree excellently with the full system calculations in both the structure and the energies of the interaction potential. In Fig. 7, the largest errors between the MFCC and full system ab initio calculations are less than 0.5 kcal/mol.

[0071] The interaction energies obtained from the AMBER force field for the triglycine/water system at the same geometries are also shown in Fig. 7. As shown, the force field gives reasonable minimum-energy positions at these geometries but does not give accurate energies. For example, for the potential curve with the spherical angle (90, 0) in Fig. 7, the well depth given by the force field is only about 7 kcal/mol. The corresponding ab initio value is 13 kcal/mol. For the potential with the approaching angle (120, 60) in Fig. 7, the force-field well depth is only about 0.3 kcal/mol, compared with the ab initio value of 2.9 kcal/mol. Similar comparisons are seen for the other potential curves in Fig. 7. Thus, for the triglycine/water interaction, the force field generally gives energy minima much higher (i.e., wells much shallower) than the ab initio calculations.
[0072] The second system, Me-His-Ser-Me in Fig. 8, is the dipeptide His-Ser with methyl groups at both ends. The interaction potential energy curves are calculated for various approaching spherical angles of the water molecule toward the peptide. Fig. 8 compares the MFCC and full system calculations at the B3LYP/6-31G level. Both the structures and the energies from the MFCC calculation are in excellent agreement with the results from the full system calculation. Even very shallow wells are accurately reproduced by the MFCC calculation. For example, as shown in Fig. 8, the well of less than 1 kcal/mol for the water approaching angle (130, 10) is accurately reproduced. Both attractive and repulsive potentials are correctly reproduced by the MFCC calculation.

[0073] The third system tested is a relatively larger peptide with five amino acids, having the sequence Gly-Ser-Ala-Asp-Val (SEQ ID NO. 1) and charged terminals. This pentapeptide was specifically chosen to include glycine (Gly) and all three types of side chains: polar (Ser), nonpolar (Ala and Val), and charged (Asp). In addition, both the N- and C-terminals are charged. This pentapeptide/water system has a total of 62 atoms.

Figure 9 shows various 1D potential curves generated from ab initio calculations at the HF/3-21G level for different approaching angles of water. As Fig. 9 shows, the MFCC and full system ab initio calculations generally agree very well for all potential curves. Both the structures of the potential curves and the energies are well reproduced by the MFCC calculations in all six cases. The largest deviation in energy from the full system calculation in Fig. 9 is about 0.5 kcal/mol, for the approaching angle (140, 200). Even the structure of a small bump of about 0.4 kcal/mol for the water approaching angle (30, 240) is faithfully reproduced, as shown in Fig. 9. For comparison, the potential curves obtained from the force field are also shown in Fig. 9. As with triglycine in Fig. 7, the force field generally gives wells that are too shallow relative to the ab initio calculations.

Next, DFT calculations were performed at the B3LYP/6-31G level for the same geometries of the pentapeptide/water system; the results are shown in Fig. 10. The HF/3-21G and B3LYP/6-31G results differ. Even so, as Fig. 10 shows, the MFCC calculation quite accurately reproduces the full system results at the same level of ab initio theory.
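The stated total of 62 atoms can be checked with simple arithmetic. The check assumes two things not stated explicitly above:

- The charged Asp side chain is a deprotonated carboxylate.
- The charged termini are NH3+ and COO-.

The Python sketch below starts from the molecular formulas of the free amino acids. It removes one water per peptide bond and one proton for the Asp side chain. The helper name is hypothetical.

```python
# Atom counts of the free, neutral amino acids from their molecular formulas.
FREE_AMINO_ACID_ATOMS = {
    "Gly": 2 + 5 + 1 + 2,    # C2H5NO2
    "Ser": 3 + 7 + 1 + 3,    # C3H7NO3
    "Ala": 3 + 7 + 1 + 2,    # C3H7NO2
    "Asp": 4 + 7 + 1 + 4,    # C4H7NO4
    "Val": 5 + 11 + 1 + 2,   # C5H11NO2
}
WATER_ATOMS = 3


def peptide_atom_count(sequence, deprotonated_side_chains=0):
    """Atoms in a linear peptide with zwitterionic (NH3+/COO-) termini."""
    n = sum(FREE_AMINO_ACID_ATOMS[res] for res in sequence)
    n -= WATER_ATOMS * (len(sequence) - 1)  # each peptide bond releases one H2O
    # Charging both termini only moves a proton from COOH to NH2: count unchanged.
    n -= deprotonated_side_chains           # e.g. Asp side-chain COOH -> COO-
    return n


pentapeptide = peptide_atom_count(["Gly", "Ser", "Ala", "Asp", "Val"],
                                  deprotonated_side_chains=1)
print(pentapeptide, pentapeptide + WATER_ATOMS)  # 59 62
```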

[0074] Figure 11 shows the positions of some of the gp41 atoms surrounding the water molecule; the approach paths are shown with arrows. The structure of gp41 is obtained from the PDB (Protein Data Bank) and is fixed throughout the calculation. The ab initio calculation of the protein-water interaction generates 1D potential curves. To do so, the water molecule is moved, with fixed orientation, along the directions of the arrows in Fig. 11. Four different interaction paths were chosen, as shown in Fig. 11. These paths generally involve some form of hydrogen bonding or attractive interaction. The one-dimensional distance is defined as the distance between the two atoms at the ends of each arrow. The exception is Fig. 11A, where it is the distance between the oxygen atom of the water and the center of mass of the protein, which lies along the direction of the arrow. The orientation of the water stays fixed as it moves along the one-dimensional straight path, so that the potential energy is obtained as a function of distance.

[0075] The ab initio calculations are performed using the Gaussian98 package. Quantum chemistry calculations were performed at several levels of theory, i.e., HF, DFT/B3LYP, and MP2. Figure 12 shows the calculated 1D potential curves for the four interaction paths illustrated in Fig. 11. For comparison, the corresponding potential curves generated from the AMBER force field are also plotted in Fig. 12.

Figure 12A shows the potential curve obtained by an HF/3-21G calculation. Its x-coordinate is the distance between the oxygen atom of the water and the center of mass of gp41 (shown in Fig. 11A). Comparing the ab initio potential curve with the force-field curve in Fig. 12A shows apparent differences between the two. The minimum position given by the force field is shifted outward by about 0.3 Å, and the energy scales also differ quantitatively. However, the HF/3-21G level of ab initio calculation employs a small basis set and includes no electron correlation. As is well known to those in the field, HF/3-21G typically gives good equilibrium geometries, but its calculated energies may not be very accurate.



[0076] Table 1. Calculated HIV-1 gp41-water interaction energies (kcal/mol) at minimum positions in Fig. 12B and Fig. 12D using different quantum chemistry methods as well as the AMBER force field.

    Methods          2.85 Å (a)    2.00 Å (b)    6.30 Å (b)
    Amber              -3.54         -2.44         -4.35
    HF/3-21G          -13.91         -7.28         -5.27
    HF/6-31G           -7.78
    B3LYP/6-31G       -11.77         -6.27         -6.31
    MP2/6-31G         -10.48         -4.63         -6.36

(a) Refers to the minimum position in Fig. 12B.
(b) Refers to the minimum positions in Fig. 12D.

[0077] Figure 12B shows another computed potential curve, corresponding to the interaction path depicted in Fig. 11B. Here four different levels of calculation were employed: HF/3-21G, HF/6-31G, B3LYP/6-31G, and MP2/6-31G. These calculations show that HF/3-21G gives rather accurate equilibrium positions but tends to overestimate the hydrogen bonding strength to some extent. In comparison, the HF/6-31G calculation gives a bonding energy that seems too small. The more accurate B3LYP/6-31G calculation includes electron correlation. It gives a bonding energy about 2 kcal/mol smaller than the HF/3-21G result and 4 kcal/mol larger than the HF/6-31G result (see Table 1). The MP2 calculation, also with the 6-31G basis set, gives a bonding energy about 1.3 kcal/mol smaller than the B3LYP energy. Conventional wisdom holds that the MP2 result is the most reliable of these.

[0078] The force field gives a good equilibrium position in Fig. 12B, but its bonding energy is about 7 kcal/mol smaller than the MP2 energy. A similar result is seen in Fig. 12C, corresponding to the interaction path shown in Fig. 11C. Here, the force field gives a similar equilibrium position but underestimates the strength of the hydrogen bonding. Based on the comparison in Fig. 12B, HF/3-21G is expected to overestimate the bonding energy here.

[0079] Figure 12D shows the computed potential curve corresponding to the interaction path illustrated in Fig. 11D. As shown, this potential curve has two wells. The HF/3-21G calculation gives excellent positions for both wells. However, it places the inner well about 2 kcal/mol below the outer well (see Table 1). The B3LYP/6-31G calculation gives two almost equal well depths, as can be seen in Fig. 12D and more clearly in Table 1. The MP2/6-31G calculation, which is expected to be more reliable, gives essentially the same depth for the outer well as the B3LYP calculation. However, its inner-well depth is about 1.6 kcal/mol above the B3LYP result, as shown in Table 1 and in Fig. 12D. Figure 12D also demonstrates that, overall, the force field can describe the interaction potential qualitatively but not quantitatively.
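The energy differences quoted in paragraphs [0077] through [0079] follow directly from Table 1. The short Python sketch below stores the table and recomputes them. A positive difference E(a) - E(b) means that entry a binds more weakly than entry b. The column labels come from the table footnotes and from the inner/outer-well description in [0079]. The function name is hypothetical.

```python
# Table 1 (kcal/mol). Columns: Fig. 12B minimum (2.85 Å), then the Fig. 12D
# minima at 2.00 Å (inner well) and 6.30 Å (outer well). None = not reported.
TABLE_1 = {
    "Amber":       (-3.54, -2.44, -4.35),
    "HF/3-21G":    (-13.91, -7.28, -5.27),
    "HF/6-31G":    (-7.78, None, None),
    "B3LYP/6-31G": (-11.77, -6.27, -6.31),
    "MP2/6-31G":   (-10.48, -4.63, -6.36),
}
FIG_12B, INNER_12D, OUTER_12D = 0, 1, 2


def gap(a: str, col_a: int, b: str, col_b: int) -> float:
    """E(a, col_a) - E(b, col_b), rounded to 0.01 kcal/mol."""
    ea, eb = TABLE_1[a][col_a], TABLE_1[b][col_b]
    if ea is None or eb is None:
        raise ValueError("value not reported in Table 1")
    return round(ea - eb, 2)


if __name__ == "__main__":
    # [0077]: B3LYP vs HF/3-21G, HF/6-31G and MP2 at the Fig. 12B minimum
    print(gap("B3LYP/6-31G", FIG_12B, "HF/3-21G", FIG_12B))
    print(gap("B3LYP/6-31G", FIG_12B, "HF/6-31G", FIG_12B))
    print(gap("MP2/6-31G", FIG_12B, "B3LYP/6-31G", FIG_12B))
    # [0078]: force field vs MP2 at the Fig. 12B minimum
    print(gap("Amber", FIG_12B, "MP2/6-31G", FIG_12B))
    # [0079]: inner vs outer well, and MP2 vs B3LYP for each well
    print(gap("HF/3-21G", INNER_12D, "HF/3-21G", OUTER_12D))
    print(gap("B3LYP/6-31G", INNER_12D, "B3LYP/6-31G", OUTER_12D))
    print(gap("MP2/6-31G", INNER_12D, "B3LYP/6-31G", INNER_12D))
    print(gap("MP2/6-31G", OUTER_12D, "B3LYP/6-31G", OUTER_12D))
```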
[0080] The MFCC method is particularly suited for ab initio calculation of protein-drug interactions. Existing docking programs play important roles in fast screening of drug candidates. These programs rely almost exclusively on empirical molecular force fields to obtain interaction energies. The MFCC method makes full quantum mechanical or ab initio calculation of targeted protein-inhibitor interactions possible and computationally practical. This could lead to a quantum leap in the understanding, prediction, and design of protein inhibitors in drug discovery and in other areas of chemical biology.

[0081] The MFCC method reduces the computational cost. The timings for single-point calculations were as follows:

- Gly-Ser-Ala-Asp-Val/water system (62 atoms), the numerical test of Fig. 9, HF/3-21G MFCC: about 2 minutes on a single-processor Intel Pentium 1.5 GHz Linux workstation.
- gp41-water system (985 atoms), Fig. 11, HF/3-21G: about 67 minutes on a Pentium 1.5 GHz PC running Linux.
- The same gp41-water system with correlated methods: about 516 minutes with B3LYP/6-31G and about 518 minutes with MP2/6-31G.

In fact, the MP2 calculation does not take as much time as might be expected for a large system. The reason is simply that each individual MP2 calculation involving a protein fragment is still relatively small, despite the large size of the protein. This demonstrates that high-level electron correlation methods beyond HF and DFT can be used in practical calculations of protein-molecule interaction energies.

[0082] The computational cost of the MFCC method is linearly proportional to the number of amino acids. The ab initio calculations can therefore be extended directly to molecular interactions with real protein molecules containing hundreds of amino acids. Thus the MFCC method makes full ab initio calculation of protein-molecule interaction energies practical even on personal computers. The ab initio calculations of the MFCC method can also be easily parallelized to run on multi-node computer clusters, where individual fragments are calculated simultaneously on separate computers. This can dramatically speed up the computation. For example, take an ab initio MFCC calculation of the molecular interaction with a 200-residue protein on a 100-node cluster. It would take about the same amount of time as the calculation for a 2-residue peptide on a single-node computer.
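The example in the preceding paragraph can be checked with a rough count. The Python sketch below makes several simplifying assumptions:

- one capped fragment per residue;
- one conjugate-cap molecule per cut;
- equal cost for every job; and
- perfect load balancing.

In practice, conjugate-cap molecules are smaller than capped fragments. The function name is hypothetical.

```python
import math


def jobs_per_node(n_residues: int, n_nodes: int) -> int:
    """Largest number of independent MFCC jobs any one node must run."""
    n_jobs = n_residues + (n_residues - 1)  # capped fragments + conjugate caps
    return math.ceil(n_jobs / n_nodes)


if __name__ == "__main__":
    print(jobs_per_node(200, 100))  # 200-residue protein on a 100-node cluster
    print(jobs_per_node(2, 1))      # 2-residue peptide on a single node
```

Under these assumptions each node runs only a handful of small jobs in either case (4 versus 3). This is why the two wall-clock times are comparable.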
[0083] Full ab initio computation of interaction energies between a first 

molecule, such as a protein, and a second molecule, such as a water molecule, in 
which the entire system is included in the quantum mechanical treatment represents a 
new benchmark in extending quantum mechanical study to biological molecules. 
[0084] The MFCC scheme is of particular relevance to quantum mechanical calculations of protein-drug interactions. The process has been applied to the streptavidin-biotin binding complex. It has also been used to design a compound that binds HIV-1 RT better than the FDA-approved drug Nevirapine.



